A Quantification of Metagame Shifts in Professional League of Legends Gameplay
Name(s): Jason Tran
Website Link: https://dsc80.enscribe.dev, https://jktrns.github.io/league-metagame-analysis
import logging
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Tuple
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import permutation_test
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer, LabelEncoder
from dsc80_utils import *
pd.options.plotting.backend = "plotly"
logging.basicConfig(
level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)
Step 1: Introduction
The data we are provided is from Oracle's Elixir, a historical database and analytics provider for the League of Legends esports scene. The site is used by professional analysts and community enthusiasts alike, and provides comprehensive match data across nearly all major leagues and competitions internationally.
The provided dataset from Oracle's Elixir consists of .csv files, where each .csv represents one year of match data. The data covers matches from 2014 up to the present day (the 2024 set updates incrementally on a daily basis). Loading a single game from the 2024 dataset:
display_df(
pd.read_csv(
"data/2024_LoL_esports_match_data_from_OraclesElixir.csv",
low_memory=False,
).head(12)[
[
"gameid",
"date",
"side",
"position",
"teamname",
"playername",
"champion",
"kills",
"deaths",
"assists",
]
],
rows=12,
cols=10,
)
| | gameid | date | side | position | teamname | playername | champion | kills | deaths | assists |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | top | LNG Esports | Zika | Aatrox | 1 | 3 | 1 |
| 1 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | jng | LNG Esports | Weiwei | Maokai | 0 | 4 | 3 |
| 2 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | mid | LNG Esports | Scout | Orianna | 0 | 2 | 0 |
| 3 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | bot | LNG Esports | GALA | Kalista | 2 | 4 | 0 |
| 4 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | sup | LNG Esports | Mark | Senna | 0 | 3 | 3 |
| 5 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | top | Rare Atom | Xiaoxu | Rumble | 4 | 0 | 6 |
| 6 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | jng | Rare Atom | naiyou | Rell | 1 | 0 | 12 |
| 7 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | mid | Rare Atom | VicLa | LeBlanc | 4 | 0 | 7 |
| 8 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | bot | Rare Atom | Assum | Varus | 7 | 1 | 5 |
| 9 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | sup | Rare Atom | Zorah | Renata Glasc | 0 | 2 | 13 |
| 10 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Blue | team | LNG Esports | NaN | NaN | 3 | 16 | 7 |
| 11 | 10660-10660_game_1 | 2024-01-01 05:13:15 | Red | team | Rare Atom | NaN | NaN | 16 | 3 | 43 |
The dataset contains 168 columns that detail game-level statistics ranging from basic performance metrics (e.g. kills, deaths, and assists) to more advanced analytics not shown in the table above (e.g. damage share, vision control, creep score). This level of detail makes it valuable for professional teams that analyze trends and make strategic decisions in gameplay.
Notice that the dataset's structure captures both macro-level game dynamics and individual player performance metrics. Each row represents a player's performance in a single game, with 10 rows per match (one for each player), plus two additional rows for team-level statistics. Key features include temporal data (patch numbers, timestamps for various objectives across the map), economic indicators (gold differences, resource distribution), and performance metrics (damage output, vision score). The data itself is very fine-grained and will allow for sophisticated analysis.
The patch number here really stands out as a key feature, and naturally leads to the question: how can we quantify how the metagame shifts across different patches? As context, "metagame" (colloquially referred to as "meta") refers to the popular strategies and team compositions that provide advantages over other strategies and team compositions at a particular point in time. Shifts in the meta are most often caused by changes in game balancing (e.g. changes to champion abilities, item builds, maps, core game mechanics, etc.) that are introduced in a new patch. Since the dataset also contains regional data, we can ask how meta differs between regions, or how meta from a particular region plays against meta from another. Meta is also more pronounced in professional play, as most players have reached the theoretical skill ceiling and are mainly separated by strategy and composition; in more amateur or casual play, meta is less impactful compared to technical skill and "game sense"/knowledge.
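The row layout described above (ten player rows plus two team rows per game) can be checked directly. The following is a minimal sketch on a made-up single game; the gameid `demo_game_1` and the helper `rows_per_game` are illustrative names, not part of the dataset or of the analysis code.

```python
import pandas as pd

# Toy stand-in for one game: ten player rows plus two team rows.
# The gameid "demo_game_1" is made up for illustration.
positions = ["top", "jng", "mid", "bot", "sup"]
rows = [
    {"gameid": "demo_game_1", "side": side, "position": pos}
    for side in ["Blue", "Red"]
    for pos in positions
] + [
    {"gameid": "demo_game_1", "side": side, "position": "team"}
    for side in ["Blue", "Red"]
]
toy = pd.DataFrame(rows)


def rows_per_game(df: pd.DataFrame) -> pd.DataFrame:
    """Count player rows and team rows for each gameid."""
    is_team = df["position"].eq("team")
    return pd.DataFrame(
        {
            "player_rows": (~is_team).groupby(df["gameid"]).sum(),
            "team_rows": is_team.groupby(df["gameid"]).sum(),
        }
    )


structure = rows_per_game(toy)
print(structure)
```

Run against the real data, any game whose counts differ from 10 and 2 would be worth inspecting before analysis.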
Step 2: Data Cleaning and Exploratory Data Analysis
Data Cleaning
Since our dataset provides two layers of granularity (the player level and the team level), we can divide the dataset into two separate DataFrames: players and teams:
def load_match_data(years_range: range = range(2014, 2025)) -> pd.DataFrame:
    dfs = []
    for year in years_range:
        try:
            df = pd.read_csv(
                f"data/{year}_LoL_esports_match_data_from_OraclesElixir.csv",
                low_memory=False,
            )
            dfs.append(df)
        except FileNotFoundError:
            # Skip years whose file is not available locally
            continue
    return pd.concat(dfs, ignore_index=True)


def split_player_team_data(
    matches_raw: pd.DataFrame,
) -> tuple[pd.DataFrame, pd.DataFrame]:
    players_raw = matches_raw[
        matches_raw["position"].isin(["top", "jng", "mid", "bot", "sup"])
    ]
    teams_raw = matches_raw[matches_raw["position"] == "team"]
    return players_raw, teams_raw
matches_raw = load_match_data()
players_raw, teams_raw = split_player_team_data(matches_raw)
print("Shape of players DataFrame:", players_raw.shape)
print("Shape of teams DataFrame:", teams_raw.shape)
display_df(
players_raw.iloc[:10][
[
"gameid",
"date",
"side",
"teamname",
"playername",
"champion",
]
],
rows=10,
cols=7,
)
display_df(
teams_raw.iloc[:2][
[
"gameid",
"date",
"teamname",
]
],
rows=2,
cols=4,
)
Shape of players DataFrame: (828380, 161)
Shape of teams DataFrame: (165676, 161)
| | gameid | date | side | teamname | playername | champion |
|---|---|---|---|---|---|---|
| 0 | TRLH3/33 | 2014-01-14 17:52:02 | Blue | Fnatic | sOAZ | Trundle |
| 1 | TRLH3/33 | 2014-01-14 17:52:02 | Blue | Fnatic | Cyanide | Vi |
| 2 | TRLH3/33 | 2014-01-14 17:52:02 | Blue | Fnatic | xPeke | Orianna |
| 3 | TRLH3/33 | 2014-01-14 17:52:02 | Blue | Fnatic | Rekkles | Jinx |
| 4 | TRLH3/33 | 2014-01-14 17:52:02 | Blue | Fnatic | YellOwStaR | Annie |
| 5 | TRLH3/33 | 2014-01-14 17:52:02 | Red | Gambit Gaming | Darien | Dr. Mundo |
| 6 | TRLH3/33 | 2014-01-14 17:52:02 | Red | Gambit Gaming | Diamondprox | Shyvana |
| 7 | TRLH3/33 | 2014-01-14 17:52:02 | Red | Gambit Gaming | Alex Ich | LeBlanc |
| 8 | TRLH3/33 | 2014-01-14 17:52:02 | Red | Gambit Gaming | Genja | Lucian |
| 9 | TRLH3/33 | 2014-01-14 17:52:02 | Red | Gambit Gaming | Edward | Thresh |
| | gameid | date | teamname |
|---|---|---|---|
| 10 | TRLH3/33 | 2014-01-14 17:52:02 | Fnatic |
| 11 | TRLH3/33 | 2014-01-14 17:52:02 | Gambit Gaming |
After splitting the data into players and teams DataFrames, we notice that some columns contain only missing values. This is because certain columns only pertain to one granularity level but not the other. For example, player-specific columns like 'champion', 'position', and 'playername' will be empty in the teams DataFrame since they only apply to individual players, and team-level statistics like 'firstdragon' and 'firstblood' will be empty in the players DataFrame since they represent aggregate team performance rather than individual stats. We should clean these columns to avoid confusion when performing analysis:
def clean_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only the columns that have at least one non-missing value
    return df.loc[:, ~df.isna().all()]
print("Columns that would be removed from players DataFrame:")
print(list(players_raw.columns[players_raw.isna().all()]))
print("Columns that would be removed from teams DataFrame:")
print(list(teams_raw.columns[teams_raw.isna().all()]))
print(
f"Change in players columns after cleaning: {len(players_raw.columns)} -> {len(clean_empty_columns(players_raw).columns)}"
)
print(
f"Change in teams columns after cleaning: {len(teams_raw.columns)} -> {len(clean_empty_columns(teams_raw).columns)}"
)
Columns that would be removed from players DataFrame:
['pick1', 'pick2', 'pick3', 'pick4', 'pick5', 'firstdragon', 'dragons', 'opp_dragons', 'elementaldrakes', 'opp_elementaldrakes', 'infernals', 'mountains', 'clouds', 'oceans', 'chemtechs', 'hextechs', 'dragons (type unknown)', 'elders', 'opp_elders', 'firstherald', 'heralds', 'opp_heralds', 'void_grubs', 'opp_void_grubs', 'firstbaron', 'firsttower', 'towers', 'opp_towers', 'firstmidtower', 'firsttothreetowers', 'turretplates', 'opp_turretplates', 'gspd', 'gpr']
Columns that would be removed from teams DataFrame:
['playername', 'playerid', 'champion', 'firstbloodkill', 'firstbloodassist', 'firstbloodvictim', 'damageshare', 'earnedgoldshare']
Change in players columns after cleaning: 161 -> 127
Change in teams columns after cleaning: 161 -> 153
Additionally, there are multiple columns in both datasets that are semantically boolean but are stored as either integers or floats. We should convert these columns to boolean type:
def get_boolean_columns(df: pd.DataFrame) -> list:
    bool_cols = []
    for col in df.columns:
        unique_vals = df[col].dropna().unique()
        if all(val in [0, 1] for val in unique_vals):
            bool_cols.append(col)
    return bool_cols


def convert_boolean_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame (often a slice) is not modified
    df = df.copy()
    bool_cols = get_boolean_columns(df)
    for col in bool_cols:
        df[col] = df[col].astype("boolean")
    return df
print("Columns that would be converted to boolean in players DataFrame:")
print(players_raw.pipe(clean_empty_columns).pipe(get_boolean_columns))
print("Columns that would be converted to boolean in teams DataFrame:")
print(teams_raw.pipe(clean_empty_columns).pipe(get_boolean_columns))
Columns that would be converted to boolean in players DataFrame:
['playoffs', 'result', 'firstblood', 'firstbloodkill', 'firstbloodassist', 'firstbloodvictim']
Columns that would be converted to boolean in teams DataFrame:
['playoffs', 'result', 'firstblood', 'firstdragon', 'firstherald', 'firstbaron', 'firsttower', 'firstmidtower', 'firsttothreetowers']
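The conversion uses pandas' nullable "boolean" dtype rather than NumPy's bool. A small sketch of why that matters for columns with missing values follows; the series name `firstblood_demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy 0/1 flag with one missing value, mimicking a column such as firstblood.
flag = pd.Series([1.0, 0.0, np.nan], name="firstblood_demo")

nullable = flag.astype("boolean")  # nullable BooleanDtype: NaN becomes <NA>
plain = flag.astype(bool)          # NumPy bool: NaN is silently coerced to True

print(nullable.tolist())
print(plain.tolist())
```

With the nullable dtype, missing entries stay missing instead of being counted as True in later aggregations.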
We can do some other miscellaneous cleaning, such as converting the date column to a datetime object, and adding padding to the patch column. We can also add a major_patch column to the players DataFrame (the section of the patch number to the left of the decimal point), which indicates a major separation in the game's mechanics:
def clean_patch_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Zero-pad both halves of the patch number, e.g. "4.5" -> "04.05"
    df["patch"] = df["patch"].apply(
        lambda x: (
            np.nan
            if pd.isna(x)
            else (
                str(x).split(".")[0].zfill(2) + "." + str(x).split(".")[1].zfill(2)
                if "." in str(x)
                else str(x).zfill(2) + ".00"
            )
        )
    )
    # The major patch is the part to the left of the decimal point
    df["major_patch"] = df["patch"].str.split(".").str[0].str.zfill(2) + ".X"
    df["major_patch"] = pd.Categorical(
        df["major_patch"],
        categories=[f"{str(i).zfill(2)}.X" for i in range(3, 15)],
        ordered=True
    )
    return df
print("Major patch categories:")
print(players_raw.pipe(clean_patch_data)["major_patch"].unique())
Major patch categories:
['03.X', NaN, '04.X', '05.X', '06.X', ..., '10.X', '11.X', '12.X', '13.X', '14.X']
Length: 13
Categories (12, object): ['03.X' < '04.X' < '05.X' < '06.X' ... '11.X' < '12.X' < '13.X' < '14.X']
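One reason to pad the patch column is that unpadded patch strings do not sort chronologically. The sketch below illustrates this on a few made-up patch strings; `pad_patch` is a hypothetical string-only helper written for this example, not the function used in the pipeline.

```python
def pad_patch(patch: str) -> str:
    """Zero-pad both halves of a patch string, e.g. "4.5" -> "04.05"."""
    major, _, minor = patch.partition(".")
    return f"{major.zfill(2)}.{(minor or '0').zfill(2)}"


raw = ["14.9", "4.21", "14.10", "9.3"]

unpadded_order = sorted(raw)                      # plain string sort
padded_order = sorted(pad_patch(p) for p in raw)  # padded string sort

print(unpadded_order)  # ['14.10', '14.9', '4.21', '9.3']
print(padded_order)    # ['04.21', '09.03', '14.09', '14.10']
```

Note that padding only helps when the patch is available as text: if a value such as 14.10 has been parsed as a float, its trailing zero is already gone by the time it is converted to a string.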
Taking a look at the unique values of the datacompleteness column, we can see that there are three categories: "complete", "partial", and "error". The "error" category is likely due to a data collection error:
print(
teams_raw["datacompleteness"]
.value_counts(normalize=True)
.apply(lambda x: f"{x:.4f}")
)
datacompleteness
complete    0.8791
partial     0.1204
error       0.0005
Name: proportion, dtype: object
Since the "error" category only accounts for 0.05% of the data, we can safely drop it:
def drop_error_data(df: pd.DataFrame) -> pd.DataFrame:
    return df[df["datacompleteness"] != "error"]
We also notice that a lot of the values for columns pick1 through pick5 are missing:
print("Percentage of missing values in pick columns:")
for col, pct in (
    teams_raw.pipe(drop_error_data)[[f"pick{i}" for i in range(1, 6)]].isna().mean().mul(100).items()
):
    print(f"{col}: {pct:.2f}%")
Percentage of missing values in pick columns:
pick1: 24.26%
pick2: 24.26%
pick3: 24.26%
pick4: 24.26%
pick5: 24.26%
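In professional play, there is a "draft phase" in which bans and picks happen in a fixed order that alternates between the two teams. In the current tournament draft format, blue side picks first, red side takes the next two picks, blue takes the two after that, and red takes the sixth; after a second ban phase, red picks once, blue picks twice, and red makes the final pick.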
The issue is that this dataset covers games from a time before this system was implemented, and as such we have many rows where the pick1, pick2, etc. columns are all missing even though their respective champion entries in the players DataFrame are not. We can't simply use the champion column entries as a fallback either, because the order of champions in the champion column is hardcoded as their role in the game: (1) top laner, (2) jungler, (3) middle laner, (4) bottom laner, and (5) support. Pick order is extraordinarily important in professional play and indicative of the team's strategy and priorities (and as such, a key component of metagame).
To impute these missing values, what we can do is calculate the "presence" of the champion across a particular timeframe surrounding that match (in this case, we will choose the patch number). We can do this by finding the percentage of games in which that champion appeared (as in getting picked or banned) for that patch, and then use that to determine the pick order. Although this is imperfect, it should provide a good enough approximation for our purposes.
Firstly, we will forward fill the patch column based on the chronological order of the matches, and then calculate the champion presence rate for each patch:
def fill_missing_patches(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.copy()
        .sort_values("date")
        .assign(patch=lambda x: x["patch"].ffill())
    )


def calculate_patch_importance(teams_df: pd.DataFrame, players_df: pd.DataFrame) -> dict:
    players_filtered = players_df[["patch", "champion"]]
    patch_importance = {}
    for patch in teams_df["patch"].unique():
        patch_teams = teams_df.loc[
            teams_df["patch"] == patch,
            ["ban1", "ban2", "ban3", "ban4", "ban5"]
        ]
        patch_picks = players_filtered.loc[
            players_filtered["patch"] == patch,
            "champion"
        ]
        bans = patch_teams.values.ravel()
        valid_bans = bans[~pd.isna(bans)]
        # Two team rows per game
        total_games = len(patch_teams) / 2
        all_champs = np.concatenate([patch_picks.values, valid_bans])
        champion_counts = pd.Series(all_champs).value_counts()
        # Presence = share of games in which the champion was picked or banned
        patch_importance[patch] = champion_counts / total_games
    return patch_importance
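As a quick illustration of the forward fill, the sketch below applies the same sort-then-ffill step to three made-up games, one of which has no recorded patch. The game IDs, dates, and patch values are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up rows: the game played on 2024-03-02 has no recorded patch.
toy = pd.DataFrame(
    {
        "gameid": ["g3", "g1", "g2"],
        "date": pd.to_datetime(["2024-03-03", "2024-03-01", "2024-03-02"]),
        "patch": ["14.05", "14.04", np.nan],
    }
)

# Sort chronologically, then carry the last known patch forward.
filled = toy.sort_values("date").assign(patch=lambda d: d["patch"].ffill())
print(filled)
```

The missing patch for `g2` inherits `14.04` from the game played immediately before it, which is why the sort has to happen before the fill.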
For example, here are the presence rates for the top 10 champions in patch 14.22:
display_df(
pd.DataFrame(
calculate_patch_importance(
teams_raw.pipe(clean_patch_data).pipe(drop_error_data).pipe(fill_missing_patches),
players_raw.pipe(clean_patch_data).pipe(drop_error_data).pipe(fill_missing_patches),
)["14.22"]
.sort_values(ascending=False)
.head(10)
)
.reset_index()
.rename(columns={"index": "champion", "count": "presence"})
.set_index("champion"),
rows=10,
)
| champion | presence |
|---|---|
| Corki | 1.00 |
| Aurora | 1.00 |
| Skarner | 1.00 |
| Ashe | 0.94 |
| K'Sante | 0.88 |
| Yone | 0.71 |
| Vi | 0.65 |
| Orianna | 0.65 |
| Varus | 0.59 |
| Jax | 0.59 |
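To see how presence rates translate into an estimated pick order, here is a minimal sketch using four rates from the table above. The team composition is hypothetical, and "Rell" does not appear in that table, so its rate of 0.0 is a placeholder rather than a measured value.

```python
# Presence rates copied from the patch 14.22 table above. "Rell" is absent from
# that table, so it falls back to the placeholder rate 0.0 (an assumption).
presence = {"Corki": 1.00, "Ashe": 0.94, "K'Sante": 0.88, "Vi": 0.65}

# Hypothetical team listed in role order: top, jng, mid, bot, sup.
role_order = ["K'Sante", "Vi", "Corki", "Ashe", "Rell"]

# Higher presence is treated as higher draft priority.
estimated_pick_order = sorted(
    role_order,
    key=lambda champ: presence.get(champ, 0.0),
    reverse=True,
)
print(estimated_pick_order)  # ['Corki', 'Ashe', "K'Sante", 'Vi', 'Rell']
```

Because Python's sort is stable, champions with equal presence keep their role order, so ties are broken by role rather than at random.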
For each row, we create:
- A picks column that uses the pick1 through pick5 columns if they are not missing; otherwise, we impute it with the champion column entries from the players DataFrame, sorted by the champion presence rates for that patch as an estimate of their pick priority.
- A bans column that is simply the concatenation of whatever is in the ban1 through ban5 columns (it's already ordered, so we don't need to sort):
def process_draft_data(
    teams_df: pd.DataFrame, players_df: pd.DataFrame
) -> pd.DataFrame:
    draft_df = teams_df.copy()
    pick_cols = [f"pick{i}" for i in range(1, 6)]
    ban_cols = [f"ban{i}" for i in range(1, 6)]
    # Recorded pick order, where available
    team_picks = draft_df[pick_cols].values
    ordered_picks = [[pick for pick in row if pd.notna(pick)] for row in team_picks]
    draft_df["ordered_picks"] = ordered_picks
    # Champions played by each team, taken from the player rows
    player_picks = (
        players_df.sort_values(["gameid", "side", "position"])
        .groupby(["gameid", "side"], observed=True)
        .agg(player_picks=("champion", list))
        .reset_index()
    )
    draft_df = pd.merge(
        draft_df, player_picks, on=["gameid", "side"], how="left", validate="1:1"
    )
    presence_rates = calculate_patch_importance(teams_df, players_df)
    presence_lookup = {
        (patch, champ): rate
        for patch, champs in presence_rates.items()
        for champ, rate in champs.items()
    }

    def order_picks_by_presence(row):
        # Use the recorded order when all five picks are present
        if len(row.ordered_picks) == 5:
            return row.ordered_picks
        # Otherwise, estimate the order from per-patch presence rates
        return sorted(
            row.player_picks,
            key=lambda champ: presence_lookup.get((row["patch"], champ), 0),
            reverse=True,
        )

    draft_df["picks"] = draft_df.apply(order_picks_by_presence, axis=1)
    ban_data = draft_df[ban_cols].values
    bans = [[ban for ban in row if pd.notna(ban)] for row in ban_data]
    draft_df["bans"] = bans
    draft_df = draft_df.drop(
        pick_cols + ban_cols + ["ordered_picks", "player_picks"], axis=1
    )
    return draft_df
players = (
players_raw
.pipe(drop_error_data)
.pipe(clean_empty_columns)
.pipe(convert_boolean_columns)
.pipe(clean_patch_data)
.pipe(fill_missing_patches)
)
teams = (
teams_raw
.pipe(drop_error_data)
.pipe(clean_empty_columns)
.pipe(convert_boolean_columns)
.pipe(clean_patch_data)
.pipe(fill_missing_patches)
.pipe(process_draft_data, players)
)
teams
| | gameid | datacompleteness | url | league | ... | opp_deathsat25 | major_patch | picks | bans |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TRLH3/33 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 10.0 | 03.X | [Annie, Vi, Jinx, Trundle, Orianna] | [Riven, Kha'Zix, Yasuo] |
| 1 | TRLH3/33 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 4.0 | 03.X | [Thresh, LeBlanc, Lucian, Shyvana, Dr. Mundo] | [Kassadin, Nidalee, Elise] |
| 2 | TRLH3/44 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 7.0 | 03.X | [Elise, Lucian, Lulu, Shyvana, Kayle] | [Lee Sin, Annie, Yasuo] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 165589 | LOLTMNT01_180445 | complete | NaN | NEXO | ... | 14.0 | 14.X | [Wukong, Caitlyn, Nautilus, K'Sante, Taliyah] | [Skarner, Corki, Ashe, Gnar, Ornn] |
| 165590 | LOLTMNT01_181121 | complete | NaN | NEXO | ... | 11.0 | 14.X | [Yone, Poppy, Rumble, Jinx, Vi] | [Zyra, Nocturne, Aurora, Caitlyn, Kai'Sa] |
| 165591 | LOLTMNT01_181121 | complete | NaN | NEXO | ... | 15.0 | 14.X | [LeBlanc, Rell, K'Sante, Aphelios, Taric] | [Skarner, Corki, Ashe, Rek'Sai, Sejuani] |
165592 rows × 146 columns
Finally, to meet the "tidy data" requirement of having one observation per row, we can make it so that each match is represented by a single row instead of two. What we can do is prefix columns that are team-dependent with the team's color (e.g. blue_firstblood or red_teamname):
def format_matches_data(draft_data: pd.DataFrame) -> pd.DataFrame:
    # Columns that describe the game itself and are shared by both teams
    base_cols = [
        "gameid",
        "datacompleteness",
        "url",
        "league",
        "year",
        "split",
        "playoffs",
        "date",
        "game",
        "patch",
        "major_patch",
        "gamelength",
    ]
    # Everything else is team-specific and gets a side prefix
    draft_cols = [col for col in draft_data.columns if col not in base_cols]
    blue_cols = {col: f"blue_{col}" for col in draft_cols}
    red_cols = {col: f"red_{col}" for col in draft_cols}
    blue_teams = draft_data[draft_data["side"] == "Blue"].rename(columns=blue_cols)
    red_teams = draft_data[draft_data["side"] == "Red"].rename(columns=red_cols)
    matches = (
        draft_data[base_cols]
        .drop_duplicates()
        .merge(blue_teams[["gameid"] + list(blue_cols.values())], on="gameid")
        .merge(red_teams[["gameid"] + list(red_cols.values())], on="gameid")
    )
    return matches
matches = format_matches_data(teams)
matches
| | gameid | datacompleteness | url | league | ... | red_opp_assistsat25 | red_opp_deathsat25 | red_picks | red_bans |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TRLH3/33 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 23.0 | 4.0 | [Thresh, LeBlanc, Lucian, Shyvana, Dr. Mundo] | [Kassadin, Nidalee, Elise] |
| 1 | TRLH3/44 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 16.0 | 6.0 | [Thresh, Renekton, Caitlyn, Gragas, Vi] | [Kassadin, Kha'Zix, Ziggs] |
| 2 | TRLH3/76 | complete | http://matchhistory.na.leagueoflegends.com/en/... | EU LCS | ... | 4.0 | 4.0 | [Renekton, Vi, Leona, Ziggs, Jinx] | [Yasuo, Elise, LeBlanc] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 82793 | LOLTMNT01_180434 | complete | NaN | NEXO | ... | 24.0 | 19.0 | [Varus, Xin Zhao, Sylas, Gnar, Thresh] | [Skarner, Corki, Seraphine, Gwen] |
| 82794 | LOLTMNT01_180445 | complete | NaN | NEXO | ... | 23.0 | 14.0 | [Wukong, Caitlyn, Nautilus, K'Sante, Taliyah] | [Skarner, Corki, Ashe, Gnar, Ornn] |
| 82795 | LOLTMNT01_181121 | complete | NaN | NEXO | ... | 19.0 | 15.0 | [LeBlanc, Rell, K'Sante, Aphelios, Taric] | [Skarner, Corki, Ashe, Rek'Sai, Sejuani] |
82796 rows × 280 columns
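The long-to-wide reshaping can be hard to picture at this scale, so here is a minimal sketch of the same idea on a made-up two-game sample. The team names, the column subsets, and the helper `side_frame` are illustrative assumptions, not part of the pipeline above.

```python
import pandas as pd

# Made-up two-game sample at the team level (two rows per game).
teams_toy = pd.DataFrame(
    {
        "gameid": ["g1", "g1", "g2", "g2"],
        "patch": ["14.01", "14.01", "14.02", "14.02"],
        "side": ["Blue", "Red", "Blue", "Red"],
        "teamname": ["Team A", "Team B", "Team C", "Team A"],
        "result": [True, False, False, True],
    }
)

game_cols = ["gameid", "patch"]     # identical for both teams in a game
team_cols = ["teamname", "result"]  # differ by side, so they get a prefix


def side_frame(df: pd.DataFrame, side: str) -> pd.DataFrame:
    """Select one side's rows and prefix its team-specific columns."""
    prefix = side.lower()
    return (
        df.loc[df["side"] == side, ["gameid"] + team_cols]
        .rename(columns={col: f"{prefix}_{col}" for col in team_cols})
    )


matches_toy = (
    teams_toy[game_cols]
    .drop_duplicates()
    .merge(side_frame(teams_toy, "Blue"), on="gameid", validate="1:1")
    .merge(side_frame(teams_toy, "Red"), on="gameid", validate="1:1")
)
print(matches_toy)
```

Passing `validate="1:1"` makes the merge raise an error if a game ever has more than one row per side, which guards against silently duplicated matches.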
We now have three DataFrames with three levels of granularity we can perform analysis on:
- matches: one row per match, with team-specific columns
- teams: one row per team per match, with team-specific columns
- players: one row per player per match, with player-specific columns
Univariate Analysis
League of Legends has a roster of 169 champions (with 168 available in professional play) as of November 2024, but only a subset are considered viable in professional play during any given meta. Although a bivariate analysis is more appropriate for exploring meta shifts over time, we can still get a sense of the most "reliable" champions by looking at the top 20 most picked champions across the entire dataset. I'll color the bars by role for additional context, but since this is a univariate analysis, the coloring should not be used to draw conclusions about roles:
role_colors = {
"top": "#E57373",
"jng": "#81C784",
"mid": "#64B5F6",
"bot": "#FFB74D",
"sup": "#BA68C8",
}
role_names = {
"top": "Top Lane",
"jng": "Jungle",
"mid": "Mid Lane",
"bot": "Bot Lane",
"sup": "Support"
}
fig = px.bar(
(
players.groupby(["champion", "position"])["champion"]
.count()
.reset_index(name="picks")
.sort_values("picks", ascending=False)
.loc[
lambda df: df["champion"].isin(
df.groupby("champion")["picks"].sum().nlargest(20).index
)
]
.assign(
position=lambda df: pd.Categorical(
df["position"],
categories=["top", "jng", "mid", "bot", "sup"],
ordered=True,
)
)
.sort_values("position")
),
x="champion",
y="picks",
color="position",
title="Top 20 Most Picked Champions",
labels={"champion": "Champion", "picks": "Number of Picks", "position": "Role"},
color_discrete_map=role_colors,
category_orders={"position": ["top", "jng", "mid", "bot", "sup"]},
template="plotly_dark",
width=650,
height=600
).update_layout(
xaxis={
"categoryorder": "array",
"categoryarray": (
players.groupby(["champion", "position"])["champion"]
.count()
.reset_index(name="picks")
.groupby("champion")["picks"]
.sum()
.sort_values(ascending=False)
.head(20)
.index
),
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"},
"tickangle": 45
},
yaxis={
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"},
"range": [0, None]
},
paper_bgcolor='#0A1428',
plot_bgcolor='#0A1428',
title={
"font": {"family": "Beaufort"},
"y": 0.95
},
font=dict(
family="OpenSansRegular",
color='#f0e6d2'
),
margin=dict(b=100),
showlegend=True,
legend_title_text="Typical Role",
legend=dict(
yanchor="top",
y=0.99,
xanchor="right",
x=0.99,
bgcolor='rgba(10,20,40,0.8)',
bordercolor='#f0e6d2'
)
)
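# Relabel each trace by its role code rather than by its position in fig.data,
# so the legend stays correct even if a role is missing from the top 20
fig.for_each_trace(
    lambda trace: trace.update(name=role_names.get(trace.name, trace.name))
)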
fig.show()
fig.write_html('charts/top-20-champions.html', include_plotlyjs='cdn')
The data reveals Nautilus as overwhelmingly the most picked champion of all time in professional play, followed by Ezreal and Braum. This immediately shows how specific champions are prioritized over others due to strategy. In particular:
- Nautilus' kit provides strong engage tools and utility that allow him to set up team fights and engage targets of opportunity, which are highly valuable in every meta.
- Ezreal is a difficult champion with a high skill ceiling, which may pay off more in professional play than elsewhere.
We can also take a look at the most banned champions across the entire dataset. Banning is typically a strategy done to either remove a champion that synergizes well with the enemy team's playstyle/composition (or to remove the "main" champion that the enemy's star player/carry is likely to pick), or to remove a champion that is simply either annoying or overpowered in the current meta:
ban_data = pd.concat([
pd.Series([ban for bans in matches["blue_bans"] for ban in bans]),
pd.Series([ban for bans in matches["red_bans"] for ban in bans]),
])
role_data = players.groupby(['champion', 'position']).size().reset_index(name='count')
most_common_roles = role_data.sort_values('count', ascending=False).groupby('champion').first()
top_20_bans = ban_data.value_counts().head(20)
colors = [role_colors.get(most_common_roles.loc[champ, 'position'], '#C8AA6E')
if champ in most_common_roles.index else '#C8AA6E'
for champ in top_20_bans.index]
fig = px.bar(
x=top_20_bans.index,
y=top_20_bans.values,
title="Top 20 Most Banned Champions",
labels={"x": "Champion", "y": "Number of Bans"},
template="plotly_dark",
width=650,
height=600,
).update_traces(marker_color=colors).update_layout(
xaxis={
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"},
"tickangle": 45
},
yaxis={
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"},
"range": [0, None]
},
paper_bgcolor='#0A1428',
plot_bgcolor='#0A1428',
title={
"font": {"family": "Beaufort"},
"y": 0.95
},
font=dict(
family="OpenSansRegular",
color='#f0e6d2'
),
margin=dict(b=100),
showlegend=True,
legend_title_text="Typical Role",
legend=dict(
yanchor="top",
y=0.99,
xanchor="right",
x=0.99,
bgcolor='rgba(10,20,40,0.8)',
bordercolor='#f0e6d2'
)
)
# Add empty traces so that each role color appears in the legend
for role, color in role_colors.items():
    fig.add_trace(go.Bar(
        x=[None],
        y=[None],
        name=role_names[role],
        marker_color=color,
        showlegend=True
    ))
fig.show()
fig.write_html('charts/top-20-bans.html', include_plotlyjs='cdn')
LeBlanc is far and away the most banned champion of all time. Although these results may not be indicative of the current meta, it seems that at some point (or consistently), LeBlanc was a non-negotiable ban in professional play.
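The same ban tally can also be computed without nested list comprehensions. The following sketch uses `Series.explode` on made-up ban lists shaped like the blue_bans and red_bans columns; it is an alternative formulation for illustration, not the code that produced the chart.

```python
import pandas as pd

# Made-up ban lists in the same shape as the blue_bans / red_bans columns.
matches_toy = pd.DataFrame(
    {
        "blue_bans": [["LeBlanc", "Zed"], ["Kalista", "Azir"], ["LeBlanc"]],
        "red_bans": [["Kalista", "Azir"], ["LeBlanc"], ["Zed"]],
    }
)

ban_counts = (
    pd.concat([matches_toy["blue_bans"], matches_toy["red_bans"]])
    .explode()       # one row per individual ban
    .value_counts()  # missing values from empty lists are ignored
)
print(ban_counts)
```

Dividing these counts by the number of matches would give a ban rate, which is easier to compare across years with different numbers of games.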
Bivariate Analysis
Moving on to bivariate analysis, we can take a look at the distribution of gold earned by each role across the dataset. In League of Legends, there are five primary roles that players assume, each with distinct responsibilities and strategic importance:
| Role | Description |
|---|---|
| Top | The top laner is positioned in the top lane of the map. This player typically uses "tank" champions (characters that can absorb a lot of damage) or "bruisers" (characters that deal and withstand damage), who can initiate fights. They often play champions that excel in split-pushing (applying pressure on the map by attacking enemy structures while the rest of the team is elsewhere). |
| Jungle | The jungler does not stay in a fixed lane but instead moves around the map, killing neutral monsters for gold and experience. This role is vital for map control, securing objectives like Dragon and Baron (powerful neutral monsters that provide team-wide benefits when defeated), and assisting other lanes by "ganking" (surprising enemy players in their lanes to help secure kills in outnumbered fights). |
| Mid | The mid laner occupies the central lane and is crucial for controlling the map's center. This role usually involves playing "mages" (characters that use magic to deal damage) or "assassins" (characters that can quickly eliminate opponents), champions that deal significant damage and can roam to other lanes to assist teammates in securing kills. |
| Bot | The bottom lane consists of two players: the ADC (Attack Damage Carry, responsible for dealing consistent physical damage, especially in the late game) and the Support (provides utility, vision, and protection for the ADC). The bot lane is a key area for team coordination and strategy. |
| Support | The Support is part of the bottom lane duo and focuses on protecting the ADC. This role involves providing vision control with wards (items that reveal areas of the map), engaging or disengaging fights, and playing champions with crowd control abilities (skills that impair enemy movement or actions) and healing or shielding capabilities. The Support is essential for team fights and overall map awareness. |
Gold distribution is a key indicator of team resource allocation strategies. Different roles have different gold requirements based on their function within the team composition, and understanding these patterns helps reveal how teams prioritize their resources:
position_labels = {pos: role_names[pos] for pos in players['position'].unique()}
fig = px.box(
players,
x="position",
y="totalgold",
title="Gold Distribution by Role",
labels={"position": "Role", "totalgold": "Total Gold"},
color="position",
color_discrete_map=role_colors,
category_orders={"position": list(position_labels.keys())},
template="plotly_dark",
width=650,
height=600
).update_layout(
paper_bgcolor='#0A1428',
plot_bgcolor='#0A1428',
title={
"font": {"family": "Beaufort"},
"y": 0.95
},
font=dict(
family="OpenSansRegular",
color='#f0e6d2'
),
xaxis={
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"},
"ticktext": list(position_labels.values()),
"tickvals": list(position_labels.keys())
},
yaxis={
"title": {"font": {"family": "OpenSansRegular"}},
"tickfont": {"family": "OpenSansRegular"}
},
showlegend=False
)
fig.show()
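A box plot is easier to read alongside the quartiles it draws. Below is a minimal sketch that computes them per role; the gold totals are made up for illustration, and the real analysis would run the same groupby on the totalgold column of players.

```python
import pandas as pd

# Made-up gold totals; the real analysis uses the totalgold column of players.
players_toy = pd.DataFrame(
    {
        "position": ["top", "jng", "mid", "bot", "sup"] * 2,
        "totalgold": [11000, 9500, 12000, 13000, 7000,
                      12000, 10500, 13000, 14000, 8000],
    }
)

gold_summary = (
    players_toy.groupby("position")["totalgold"]
    .quantile([0.25, 0.5, 0.75])  # the quartiles a box plot displays
    .unstack()
    .rename(columns={0.25: "q1", 0.5: "median", 0.75: "q3"})
    .reindex(["top", "jng", "mid", "bot", "sup"])
)
print(gold_summary)
```

Comparing the medians and interquartile ranges across roles gives a numeric counterpart to the visual comparison in the chart.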